Goto

Collaborating Authors

 gk 2


Randomized Subspace Nesterov Accelerated Gradient

arXiv.org Machine Learning

Randomized-subspace methods reduce the cost of first-order optimization by using only low-dimensional projected-gradient information, a feature that is attractive in forward-mode automatic differentiation and communication-limited settings. While Nesterov acceleration is well understood for full-gradient and coordinate-based methods, obtaining accelerated methods for general subspace sketches that use only projected-gradient information and can improve over full-dimensional Nesterov acceleration in oracle complexity is technically nontrivial. We develop randomized-subspace Nesterov accelerated gradient methods for smooth convex and smooth strongly convex optimization under matrix smoothness and generic sketch moment assumptions. The key technical ingredient is a three-sequence formulation tailored to matrix smoothness, which recovers the corresponding classical Nesterov methods in the full-dimensional case. The resulting theory establishes accelerated oracle-complexity guarantees and makes explicit how matrix smoothness and the sketch distribution enter the complexity. It also provides a unified basis for comparing sketch families and identifying when randomized-subspace acceleration improves over full-dimensional Nesterov acceleration in oracle complexity.


a376033f78e144f494bfc743c0be3330-Supplemental.pdf

Neural Information Processing Systems

Inthis section, we provide theoretical analysis ofHSPG. Moreover, we further point out that: (1) theSub-gradient Descent Stepwe used to achieve a "close enough" solution canbereplaced byothermethods, and(2)theAssumption 4isonlyasufficientcondition thatwecouldusetoshowthe"closeenough"condition. B.1 RelatedWork Problem (12)has been well studied indeterministic optimization with various algorithms that are capable ofreturning solutions with both lowobjectivevalueandhigh group sparsity under proper λ(95;73;42;64). For example, proximal stochastic variance-reduced gradient method (Prox-SVRG)(88)and proximal spider (Prox-Spider) (97) are developed to adopt multi-stage schemes based on the well-known variance reduction technique SVRG proposed in (46) and Spider developed in (22) respectively. Under Assumption 1, the search directiondk is a descent direction forψBk(xk), i.e., d>k ψBk(xk)<0.


2c8d9636f74d0207ff4f65956010f450-Paper-Conference.pdf

Neural Information Processing Systems

Their method is a novel modification of Goldstein's classical subgradient method. Their work, however,makesuseofanonstandard subgradient oracle model andrequires the function to be directionally differentiable.